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Abstract 

The numerical construction of polynomials in the product representa- 
tion (as used for instance in variants of the multiboson technique) can 
become problematic if rounding errors induce an imprecise or even unsta- 
ble evaluation of the polynomial. We give criteria to quantify the effects 
of these rounding errors on the computation of polynomials approximating 
the function 1/s. We consider polynomials both in a real variable s and 
in a Hermitian matrix. By investigating several ordering schemes for the 
monomials of these polynomials, we finally demonstrate that there exist 
orderings of the monomials that keep rounding errors at a tolerable level. 
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1 Introduction 



In Monte Carlo simulations of fermionic systems in a discretized space-time, the 
determinant of a matrix A, related to the lattice action of the fermions, usually has 
to be taken into account. Although the form of the matrix A can be very general 
and depends on the problem under consideration, we assume in the following that 
A defines a local action of the fermions extending only over a few lattice spacings. 
A standard way to incorporate detv4 into simulation algorithms is to write it as 
a Gaussian integral 

detA = J V&V^e-^^ 1 * (1) 

where $ is a suitable complex iV-component vector on which the matrix A acts 
and T>&T><& the corresponding (properly normalized) integration measure. One 
is therefore led to the inconvenient problem of inverting an N <g> N matrix, which 
can be very large, i.e. N being O(10 6 ). 

In |l| an alternative approach was introduced: the determinant of A may be 
approximated by the inverse determinant of a polynomial P n (A) of degree n in 
the matrix A such that 

detA « [detP n (A)] _1 . (2) 

In eq. (0) and throughout the rest of the paper we assume that A is Hermitian, 
positive definite, and that \\A\\ < 1, where \\A\\ is given by the largest eigenvalue 
Amax(A) of A. In the following, we will consider only polynomials of even degree 
n. The roots Zk, k = 1, ...,n, of the polynomial hence come in complex-conjugate 
pairs and the determinant detP n can be factorized into positive factors, resulting 
in 

n/2 n/2 

[detP^A)]" 1 oc J] [\det{A - z k )\ 2 ] oc J[ / X?$tp$ fce -£^ (A - 2fc)t(A - Zfc) * fc . 

k=l k=l J 

(3) 

From eq. @, we can see that now the action of the bosonic fields is local and 
hence the task of inverting the matrix A can be avoided. Similar steps lead to 
the so-called multiboson technique for simulating fermionic systems. This tech- 
nique has been shown to give a comparable performance [2-8] to the standard 
Hybrid Monte Carlo (HMC) M or Kramers equation [Tt], [TI|] simulation algo- 
rithms. Examples for applications of the multiboson technique are Monte Carlo 
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simulations of lattice QCD || ||, supersymmetry |12| , the Schwinger model 
and the Hubbard model [Ol. 



In exact versions of the multiboson technique |4| or related approaches [[LJ, |1| [U| 
the use of a product representation of the polynomial P n (A) often turns out to 
be convenient. However, the numerical construction of a polynomial using the 
product representation can -because of rounding errors- easily lead to a loss 
of precision or even to numerical instabilities. This holds true in particular if 
computers with 32-bit arithmetic precision are used. Motivated by simulation 
algorithm studies, which use the product representation of a polynomial, we will 
show in this paper that by suitable orderings of the monomial factors, the preci- 
sion losses can be kept on a tolerable level. Although we will only demonstrate 
this for a particular example of the matrix A, we expect that similar results may 
also hold for more general situations. 

The paper is organized as follows: in section 2 we will specify the polynomial we 
have used in this work. Effects and possible origins of rounding errors when the 
polynomial is evaluated in its product representation are discussed. In section 3 
we give the ordering schemes for the monomials of the polynomial in the product 
representation. We show how the evaluation of the polynomial is affected when 
the different ordering schemes are employed. Section 4 is devoted to quantitative 
estimates of the rounding-error effects. We compare in section 5 some results 
when using 32-bit and 64-bit arithmetics and conclude in section 6. 



2 Product representations of polynomials and 
rounding errors 

Let us consider the approximation of a function f(s) depending on a real variable 
s > by a polynomial P n (s) of degree n. The motivation to initially study a 
single degree of freedom, is -besides its simplicity- that we might think of the 
matrix A as being diagonalized. Then the problem, eq. (|2|), reduces to finding 
a polynomial that approximates each A -1 (A) separately, where X(A) is a real 
eigenvalue of A. We therefore expect that studying a single degree of freedom 
can provide information also about the qualitative behaviour of rounding-error 
effects when the polynomial P ri {A) in the matrix A is computed. 
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To be specific, we follow the Chebyshev approximation method [17|, ^(J and con- 
struct a polynomial approximating the function f(s) = 1/s in the range e < s < 1, 
where e > is an adjustable parameter. The choice of the function 1/s is, of 
course, motivated by our original problem of evaluating an inverse matrix, eq. ([!]). 



We take the same polynomial -P n ,e(s) as specified in |18fl . For completeness, we 



recall here its definition. If we introduce the scaled variables u and 9 

u = ( s _ e )/(i_e) , cos6 = 2u-l (4) 

the Chebyshev polynomial T*(u) of degree r is given by (note that T* is not the 
standard definition of the Chebyshev polynomial): 

T*(u) = cos(r#) . (5) 

For given n and e the polynomial P nje (s) is then defined by 



p 

- 1 - n. 



l + pT: +1 {u) /s, (6) 



where the constant p has to be taken such that the square bracket vanishes at 
s = 0. The polynomial eq. (||) approximates the function 1/s uniformly in the 
interval e < s < 1. The relative fit error 

R n , e (s) = [P n , e (s)-l/s]s (7) 

in this interval decreases exponentially with increasing n 

< * = * (j^^J + ■ (8) 

The accuracy parameter S provides an upper bound for the absolute value of the 
relative fit error in the given interval. We note in passing that \p\ < 5 and that in 
all the practical applications studied in this paper the value of p is actually very 
close to the value of 5. 

The roots Zk of the polynomial eq. (^) can be computed analytically, 

= 5 (1 + <) - j(l + <) COS ^_ j - ^sm ^_ j . (9) 
We then obtain the desired product representation of the polynomial 

n 

Pn,ei s ) = Ul C k( s -Zk)} ■ (10) 

k=l 
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The real normalization factors Ck have to satisfy the condition 

(11} 



n ( 1 i , n - - - \ 1 

n*=c«(») = Hr n 

fc=l \ Z k=l 



1 + e 

— ^ ^fc 



When the c^'s are taken to be all identical, they turn out to be 0(1). We will 
also use later the partial products 

00 = IIM*-**)]- ( 12 ) 

fe=i 



One may also be interested [16| in a product representation of the polynomial 



Pn,e( s ) — Pn,e(o) m terms of the real variable q = ±y/s: 

2n 

Pn,e(l) = II IVckil - r k)] , (13) 



k=l 

where the roots are defined as 



rk = \^k, hn(y/zk) > , k = 1, . . . ,n 

r k = r* 2n+l _ k , k = n+l,...,2n (14) 

with the obvious generalization for the constants Jck~. The representation of 



eq.(^) is used in the algorithm described in [15|, where it leads to a most efficient 
implementation of the so-called PHMC algorithm [ |16[| . 

The above definition of the polynomial p n ,e(q), and hence also P n)£ (s), can straight- 
forwardly be generalized to the case when, instead of the real variable q, a Her- 
mitian matrix is used. In this paper we will only study the special matrix Q and 
refer to appendix A for its definition. Here we only mention that Q has a band 
structure with only the diagonal and a few off-diagonals different from zero. We 
will use for our studies of rounding-error effects both forms of the polynomial, 
eq.(|I3) and eq.(fO|). We already remark at this point that no qualitative difference 
in our results could be seen using either of the two forms of the polynomial. 
In order to discuss the rounding errors occurring in the evaluation of the polyno- 
mial p U)6 either in a real variable or in a matrix, let us consider the quantities 

Ul{q) = \Vci(q - n) . . . y/c[{q - n)q\, 
toi(Q) = \\ Vq(Q -n) .. .^(Q -ri)Q<5> hi \\/ ll^inl 

Ze {1,2,..., 2n+l} , r 2n+1 = 0, c 2n+1 = l. (15) 
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In eqs. (fL5D , $i n is a suitable vector on which the matrices act. The constants 
r 2n+ i and C2 n +i are chosen such that the last factor in U2 n +i(q) and fi 2n +i(<3) 
corresponds to a multiplication by q and Q, respectively. 

In practice the values of subsequent products, such as u)i(q) and ui + i(q), may 
differ substantially. This can be clearly seen in Fig. 0a. For the figure we have 
chosen n = 64 and e = 0.0015, leading to a relative fit accuracy of 5 ~ 0.013. 
The values of n and e are motivated by the values used in practical applications 
such as simulations of lattice QCD. 

We plot u>i(q) for values of q at the low end of the interval, q 2 = e, the middle of 
the interval, q 2 = (1 + e)/2, and the upper end of the interval, q 2 = 1. We will 
restrict ourselves to positive values of q. Using the roots as given in eq.(fH|) and 
taking the factors y/ck all identical, this restriction induces no loss of generality 
because of the relation U[(q)uj n (— q) = qu n+ i(—q), I G [1, n] and q ^ 0. In the case 
considered in Fig. [TJa the oscillations of ui(q) do not lead to numerical overflows. 
Consequently, the final value cu 2ri+ i(g) comes out to be close to 1, with a deviation 
consistent with the value of the accuracy parameter 5. 

The observed behaviour of ui(q) leads us to expect that large precision losses 
may affect the evaluation of the same polynomial in the matrix Q, when using a 
product representation, as we do for the computation of Qi(Q), eg. fli~5|) . The l-th 
step of this computation, yielding as a result, amounts to the multiplication 
of the vector 

= v^rr(g - r,_i) • . . . • y^(Q - rx)g$ in (16) 

by the matrix y/c~i(Q — r/). In order to understand the relation between the 
quantities u>j(q) and the rounding errors on £lj(Q) = ||$j||/||$i n ||, it is useful to 
think of the vectors $j and $; n as linear combinations of the eigenvectors of Q: 

®j = T,(lb\^in)^(q b -r 3 )-. ■ .-Vc^qb-ri)\q b ) , (j = 1, 2, . . . , 2n+l) , (17) 

b 

where Q\qt) = qb\qb) and $; n is assumed to have projections generally non van- 
ishing on all the eigenvectors of Q. Let us now go back to situations like the one 
in Fig. |l]a where, for several integer values of I, the quantities u>i(q) in a given 
range of values of q turn out to be much larger than for all other values of q and 
they also change substantially as a function of /. It is clear that in such situa- 
tions the quantities £li(Q) must substantially change with /, too. In particular 
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the situation £li(Q) <C f^_i(Q) must occur for some values of I, if the correct 
final value Vt2 n +i{Q) — ||$in|| is going to be obtained. But such a situation can 
only be the result of substantial cancellations in the multiplication leading from 
to it is here that we expect the occurrence of large rounding errors, 
which then propagate through the whole computation. As we will see later, these 
anticipated precision losses actually occur when the polynomial in a matrix is 
evaluated through its product representation. 

The problem itself suggests, however, its solution: the monomial factors in eq.flTUD 
or eq.flTBD should be ordered, if possible, in such a way that the absolute values 
of all subsequent products of monomials in eq.(|T0D or eq.(|T3|) have the same 
order of magnitude. Of course, whenever possible, evaluating a polynomial in the 
product representation should be avoided, because in general numerically stable 



recursion relations [[u]] or other numerical recipes are available. However, 
for some cases, as discussed in the introduction, one has to rely on the product 
representation of the polynomial. 



3 Ordering schemes 

In this section we want to introduce the different ordering schemes, that we use 
for the monomials in eqs.(|l^) and (fl3|) , and the well-known, numerically stable 
Chebyshev method for the evaluation of general polynomials. Throughout this 
paper we will use the homogeneous distribution 

Ck = (Ctot)~ 

for the normalization constants. We remark that in principle one could try, at 
least for the case when a polynomial in a matrix is considered, to also distribute 
the normalization constants Cfc in a fc-dependent way to reduce rounding errors. 
However, as expected, we only found a very weak dependence of the rounding 
errors in the matrix case on the distribution of the c^'s. 

3.1 Definition of ordering schemes 

We start by defining ordering schemes for the monomial factors in eq. fliTf) , or 
equivalently the roots of eq.(|9|). 
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Naive ordering 

As naive we regard the ordering given by 

*F Sm = z k , fc = l,...,n, (18) 

where the roots z k , given in eq. (|9|), lie on an ellipse in the complex plane. In the 
naive ordering the roots are selected from this ellipse by starting at the origin and 
moving anti-clockwise. This is indicated in Fig. |2|a, where the roots are shown 
labelled according to the order in which they are used in the evaluation of the 
polynomial of eq. (|10D . Adopting this ordering of the roots for the construction 
of the polynomial in a matrix according to its product representation gives rise 
to substantial rounding-error effects. 

Pairing scheme 

A first improvement over the naive ordering is to use a simple pairing scheme, 
which amounts to reordering the roots as follows: 

r Pair _ naive U — A n 

Z k — Z j(k) 5 K, — X, . . . , . 

Let us give the reordering index j(k) for the example of n being a multiple of 8 
and n' — n/8. In the lower half-plane, Im z k < 0, the pairing scheme is achieved 
by 

r n n n 

J = i '2'4 '4' 

n n n 

2 -2- 1 '4 + 2 -4- 1 - 

'2 + '4 + '4 + J ( } 

and for Im z k > correspondingly. An illustration of the ordering in the pairing 
scheme is shown in Fig. |2|b. 

In the case where n/2 is not divisible by 4, we search for the next integer m, 
which is smaller than n/2 and divisible by 4. We then repeat the above described 
procedure on these m roots and simply multiply the remaining roots z m+ i ■ ■ ■ z n / 2 
at the end. 
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Subpolynomial scheme 

The problem of precision losses in evaluating a polynomial in a matrix according 

to its product representation becomes more severe in general when increasing the 

degree n and in the specific case of P nje (s), eq. @, also when decreasing e. In 

order to reduce the effects of rounding errors, one may therefore be guided by 

the following intuition. Let us consider the polynomial P n _ e (s) with roots z k and 

n ^> 1. If m is an integer divisor of n, the roots z 1 , zi +m , Zi +2 m, ■ ■ ■ , z i+(n/m-i)m 

turn out to be close to the roots characterising the polynomial P n ' )£ (s) of degree 

n' = n/m (note that we keep the same e). Moreover, the normalization constants 

j_ i 
Cfc = {Ctot{n)) n and c' k = (C to t[n )) 77 are of the same order (the dependence on 

n of Cfc turns out to be negligible for large n). Then the product 



is a rough approximation of P n / )e (s), \u(s) — P n ^ e (s)\ < 1 for all e < s < 1. The 
same argument may be repeated for the other similar sequences of roots, like 

^2+2mj • • • ^2+(n/m-l)ni] • • • > *mj ^2m; ^3m) • • • Zn- 

This means that the product eq.(|T0D may be split into a product of m subpoly- 
nomials, in such a way that each of them roughly approximates a polynomial 
Pn\e{s) of lower degree n' — n/m. Because of the lower degree of the subpolyno- 
mials given by products such as eq. (^0| ), one may expect that only small changes 
in the magnitude of the partial products occur in the intermediate steps of the 
evaluation of each of these subpolynomials. 
The reordering of the subpolynomial scheme 



n/m— 1 




n 




(20) 



3=0 



,sp naive 



k — 1, . . . , n 



can be represented by 





(21) 
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where m is an integer divisor of n. 

We found that m has to be chosen y/n in order to minimize the changes 

in the magnitude of the partial products occurring in the intermediate steps of 
the construction of P n ^(s). We remark that the naive ordering is reproduced by 
the two extreme choices m — 1 and m = n. 

Bit-reversal scheme 

The subpolynomial scheme can be generalized, leading to what we will call the 
bit-reversal scheme. To illustrate how this scheme works, let us assume that the 
degree n of the polynomial is a power of 2. One now starts with the n monomial 
factors in eq. (|10D , chooses m = n/2 and applies the subpolynomial scheme 
resulting in m binomial factors. We then proceed to choose an mf = m/2 and 
again applying the subpolynomial scheme to these m binomial factors which leaves 
us with m' subpolynomials each of degree 4. The procedure can be iterated until 
we are left with only one subpolynomial having the degree of the polynomial itself. 
The above sketched procedure can be realized in practice by first representing the 
integer label (counting from to n — 1) of the roots in the naive order by its bit 
representation. The desired order is then obtained by simply reversing the bits 
in this representation. The resulting reordering of the roots is shown in Fig. |2|c, 
with n = 16 as an example. 

For n not a power of 2, we pad with dummy roots, chosen to be zero for instance, 
until the artificial number of roots is a power of 2. The bit-reversal procedure 
can then be applied as described above. Afterwards, the dummy roots have to 
be eliminated from the sequence. 

Montvay's scheme 

Recently, Montvay [1E1 suggested to order the roots according to an optimization 
procedure that can be implemented numerically. Let us shortly sketch how Mont- 
vay's ordering scheme works and refer to for further details. Let us assume 
that we have already the optimized order of the roots for the partial product 
P^ e (s), eq. (|I~2|). Then the values of \sP^ e (s)(s — z)\ are computed for all z taken 
from the set of roots not already used. The values of s are taken from a large 
enough discrete set of points, {si, . . . , sjy}, which are all in the interval [e, 1]. 
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Now, the maximal ratio over s G {s\, . . . , sn} of all values \sP^ t (s)(s — z)\, i.e. 
max \sP l (s)(s-z)\/ min IsP^ (s)(s - z)\ , 

s£{s 1 ,...,s N \ se{si,...,s JV } 

is computed for each root z separately. Finally that root is taken which gives the 
lowest of these maximal ratios. Starting with the trivial polynomial P° e (s) = 1, 
this procedure obviously defines a scheme according to which the roots can be 
ordered iteratively. We show in Fig. |2|d the resulting order of the roots using 
Montvay's scheme by again labelling the roots in the order in which they are 
used to compute the polynomial of eq. (|i0|). It is clear that Montvay's ordering 
scheme implements by construction our intuitive criterion that the changes in the 
magnitude of subsequent partial products should be minimized. 

Ordering schemes for the roots {rj} 

The above discussion concerned the ordering of monomial factors in eq. (|T0|) . If 
one wants to use the product representation, eq. (|T3D , the ordering of the monomial 
factors can be obtained by first ordering the roots z k in one of the ways described 
above and then defining the In roots rj as in eq. (fH]) . The one exception is the case 
of Montvay's scheme, where the r^'s themselves have to be ordered, in full analogy 
with the procedure desribed above for the case of the product representation ( |T0| ) . 
We remark that the ordering schemes for {rj : j — 1, . . . , 2n} which we consider 
in the present paper, always satisfy the relation in the second line of eq. (|14D . 
Although there is in principle much more freedom, this restriction turns out to 
be highly convenient for the application of Monte Carlo simulation algorithms. 



3.2 Clenshaw recursion 

Any polynomial V m (u) of degree m in the real variable u, with u G [0, 1], can be 
expressed as a linear combination of Chebyshev polynomials T£(u) 

V m (u) = -a T*(u) + a k T*{u) 
z k=i 

where do, a% . . . , a m are suitable coefficients and the definition of T£{u) is given 
by eq. (H) and cos 6 = 2u — 1 . 



A way |17|, of computing V m is to use the formula 



V m (u) = ~(bo(u)-b 2 (u)) , (22) 
11 



where the right-hand side has to be evaluated through the recursion relation: 



b m +2(u) = b m+ i{u) = 

b k (u) = 2(2u - l)6 fc+ i(«) - b k+2 {u) + a k , (23) 

for all integers k starting from m down to 0. It is also possible to prove that 
the total rounding error on the final result (b (u) — &2(w))/2 cannot exceed the 
arithmetic sum of the rounding errors occurring in each step of fl2"3p . 
This well-known, numerically stable method [I71, [J20| can be applied to the eval- 



uation of P n ,e{s) as well as of sP n<e (s). In the latter case, if we consider sP nje (s) 
as a polynomial of degree n + 1 in u = (s — e)(l — e), its expression as a linear 
combination of Chebyshev polynomials in u can be read from eq.(^|): 

sP n>e (s)=T*(u) + pT: +1 (u). 

We can then evaluate sP nt€ (s) through the Clenshaw relation (|23"D by taking 
a = 2, a n+ i = p and all the other a k s vanishing. The Clenshaw recursion will 
serve us in the following as a reference procedure for the numerical evaluation of 
the polynomial Q 2 P n ,e{Q 2 )- 

3.3 Ordering schemes at work: a first look 



As discussed in section 2, the large oscillations of uji(q), eq. (|T5|), can easily lead 
to precision losses when constructing the same polynomial in a matrix. This can 
be seen in Fig. ||b for the case of the matrix Q (see Appendix A for the definition 
of the matrix Q). There we have considered an 8 3 • 16 lattice with gauge group 
SU(2), at (3 = 1.75, k = 0.165 and c sw = 0. We compare fli(Q) computed in 32- 
bit precision (solid line) with the one computed in 64-bit precision (dashed line). 
We remark at this point that, although matrix multiplications are performed in 
32- or 64-precision, scalar products are always evaluated with 64-bit precision. 
As starting vector $ in we use a Gaussian random vector. 

The picture confirms our expectations: as long as the values of Qi(Q) are growing 
with /, both curves are basically identical. When Qi(Q) starts to decrease, how- 
ever, the values for Qi(Q) obtained with 32-bit precision deviate strongly from 
the ones obtained with 64-bit precision. In fact, instead of decreasing, £li(Q) 
computed with the 32-bit precision version of the program, even increases and 
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eventually runs into a numerical overflow. This is no surprise, of course: as dis- 
cussed above, when Qi(Q) <C Qi-\(Q), large cancellations must occur, leading 
to the observed precision losses. Note, moreover, that even the values for Qi(Q) 
obtained with a 64-bit precision are affected by large rounding errors: the final 
value fi 2n +i(Q) is completely wrong, namely O(10 25 ) instead of 0(1), as it should 
be. Clearly, using only 64-bit precision reduces the precision losses (no overflow 
is observed), but it certainly is not sufficient to keep in general rounding errors 
on a tolerable level. 

Figure [3] shows how the different, improved ordering schemes help. In Fig.[3]a 
we plot u>i(q) for three ordering schemes, the subpolynomial (solid line), the bit- 
reversal (dashed line) and Montvay's scheme (dash-dotted line). We only show 
the curves for q = 1, i.e. the worst case in Fig.^a. For other values of q the 
picture looks very similar. As compared with Fig. [lja, the large oscillations are 
strongly suppressed. As a consequence, when now Qi(Q) is constructed with 32- 
bit precision, according to the improved ordering schemes, numerical overflows 
are avoided and the desired fit accuracy is reached, as demonstrated in Fig.[3|b. 

4 Quantitative tests 

After the qualitative tests of the ordering schemes discussed in the previous sec- 
tion, we would now like to turn to more quantitative results. Guided by the 
observation that the evaluation of a polynomial in a single variable gives infor- 
mation on the precision losses that may occur when evaluating the same polyno- 
mial in a matrix, we will first investigate single variable estimators for rounding 
errors. Then we will discuss the rounding-error effects that arise in the numerical 
construction of the polynomial Q 2 P n ,e{Q 2 )- The results of this section only refer 
to ordering schemes for polynomials in the product representation of eq.(|H]). We 
stress that there would be no qualitative difference in our results if the represen- 
tation eq.flllf) had been taken. 

4.1 Estimators of rounding errors for a single variable 

A possible way of defining single variable estimators for rounding-error effects 
consists in quantifying the magnitude of the oscillations of uii(q). As a first step, 
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let us evaluate for a given I the maximal and the minimal value of |P^ e (s)|, 
eq . flEf) , over the interval < s < 1. The ratio of the maximal to the minimal 
value, i.e. Ri = max se [ 0j i] \Pn e ( s )\/ mm se[o,i] \Pi t ( s )\y * s then a measure of how 
different the order of magnitude of the l-th partial product can be for different 
values of s. Building the maximum of Ri with respect to I, we get a quantity 
independent of / and s: 

j max sem \Pl t (s)\ \ 

Rmax = max < — ' \ . (24) 

ie{i,...,n} { rmn se[0 ,i] \P^ e {s)\ J 

It is clear that -R max has to be smaller than the largest representable number on a 
given computer to guarantee the stability of the evaluation of the full polynomial. 
This is actually sufficient to exclude the occurrence of overflows or underflows in 
the evaluation of the considered partial products. If -R max fulfils this condition 
but still assumes large values, in the case of a polynomial in a matrix a reliable 
numerical result cannot be expected, since rounding errors are likely to lead to a 
substantial loss of precision. Moreover, even if -R max is reasonably small, it is still 
possible that the quantity max se [ 0) i] \P^ e (s) \ shows large oscillations as a function 
of I. As a consequence, another quantity of interest is the maximum value of the 
partial products itself: 

M max = max \Pi >e (s)\- ( 25 ) 

s€[0,l],Ze{l,...,n} 

This again has to be smaller than the largest representable number in order not to 
run into overflow. Note that -R max and M max are computed for s G [0, 1], whereas 
the polynomial in eq.(|i~0D has a given relative fit accuracy only in the interval 
s G [e, 1] . However, as will be explicitly demonstrated below, our results for -R max 
and M max do not depend very much on the choice of the lower end of the interval. 
A final remark is that the values of -R max and M max should be taken only as a 
first indication for the size of the rounding errors. Since it is not guaranteed that 
in a practical case, for the polynomial in a matrix, the value of -R max or M max 
is actually assumed, it is very possible that -R max and M max may overestimate 
the rounding errors. On the other hand, since in the case of a polynomial in a 
matrix the rounding errors occurring in different intermediate steps of the nu- 
merical computation can easily accumulate, -R max and M max might also yield an 
underestimate. 
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In order to compute the values of -R max and M max we take 5000 values of s, equally 
spaced in the interval [0, 1]. We explicitly checked that the values of -R max do not 
depend very much on the lower end of the interval [s m i n , 1] from which s is taken. 
In Fig. H we show R max , in the case of the bit-reversal scheme, as a function of 
the lower end of the interval s m i n , measured in units of the parameter e. The data 
refer to polynomials of different degree, n = 30, n = 86 and n = 146, with the 
parameter e determined by the fixed relative fit accuracy 5 = 0.001. As Fig. [| 
shows, the dependence of -R max on s min is very weak. For the other ordering 
schemes, we find a similar behaviour of -R max as a function of s min . This justifies 
the use of s m i n = that we have adopted for the numerical tests described below. 
We start by comparing the subpolynomial and the bit-reversal schemes, as they 
are closely related to each other. In Fig. [5] we show -R max and M max as a function of 
the degree n of the polynomial P n ,e(s) of eq. (|I~0"|) , keeping the relative fit accuracy 
5 = 0.1 constant by adjusting the parameter e. For the subpolynomial scheme, 
the divisor m is chosen to be m ~ y/n. Figure [5] clearly confirms our expectation 
that the bit-reversal scheme, considered as a generalization of the subpolynomial 
ordering scheme, gives smaller values of -R max and M max . For degrees of the 
polynomial n > 40 the "dangerous" oscillations are substantially suppressed in 
the bit-reversal scheme compared with the subpolynomial scheme. 
In Fig. H we show the values of R max and M max for the bit reversal, the naive, the 
pairing and the Montvay's scheme, at a fixed value of 5 = 0.001, as a function 
of n. Numerical tests for the different ordering schemes were also performed at 
values of 5 = 0.1 and 5 = 0.01, yielding a very similar qualitative behaviour of 
_R max and M max as a function of n. 

The first striking observation in Fig. |5] is that one obtains with the naive ordering, 
already for moderate degrees n w 30 of the polynomial, large values of -R max and 
^ma X ; this indicates that very large oscillations of the partial products occur in 
the intermediate steps of the construction of the polynomial. Using the naive 
scheme on machines with 32-bit precision or even with 64-bit precision, a safe 
evaluation of the polynomial in the product representation can certainly not be 
guaranteed. 

The behaviour of the values of -R max and M max obtained by using the naive order- 
ing scheme clearly demonstrates the necessity of finding better ordering schemes. 
That such ordering schemes do exist is also demonstrated in Fig. |6]. For n < 100, 
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the values for R msx and M max obtained from the pairing, bit reversal and Mont- 
vay's schemes are close to each other and many orders of magnitude below the 
ones of the naive scheme. However, for n > 120, the values of i? max from these 
ordering schemes also start to deviate from each other. Taking the values of 
_R max and M max as an estimate of the effects of rounding errors arising in the 
construction of polynomials in matrix, it seems that the bit reversal and Mont- 
vay's scheme are the most effective, in reducing these effects, out of the ordering 
schemes investigated here. 



4.2 Quantitative tests for a polynomial in a matrix 

The numerical tests involving a polynomial in the matrix Q, eq . (|35|) , are per- 
formed using a sample of thermalized SU(3) gauge field configurations on 8 3 ■ 16 
lattices. All numerical computations were done on the massively parallel Ale- 
nia Quadrics (APE) machines, which have only 32-bit precision. Simulation 
parameters are chosen to be (3 = 6.8, k = 0.1343 and c sw = 1.42511. They 
correspond to realistic parameter values as actually used in simulations to deter- 
mine values of c sw non-perturbatively Throughout this section we will use 
Schrodinger functional boundary conditions as mentioned in Appendix A. Aver- 
aging over the gauge field configurations, for the above choice of parameters and 
cm = 0.735, the lowest eigenvalue of Q 2 is A m i n = 0.00114(4) and the largest is 
Amax = 0.8721(3). Investigations are performed at values of (n,e) of (16,0.003), 
(32,0.003), (64,0.0022) and (100,0.0022). At each of these values of (n, e) we 
have generated O(50) gauge field configurations. Governed by heuristic argu- 
ments for a real simulation, e should be chosen as e ~ 2A m i n and 5 ~ 0.01, 



which roughly corresponds to the choice (n, e) = (64,0.0022). 

We apply the matrix Q 2 P n ^{Q 2 ), which should be close to the unit matrix for our 

choices of n and e, to a random Gaussian vector Rq and construct the vectors 

border = Q^nAQ 2 )^ , (26) 

where P n ,e{Q 2 ) is defined analogously to eq.(|l~0|) and evaluated using different 
ordering schemes. The subscript "order" stands for naive, pairing, bit rever- 
sal, Montvay and Clenshaw. In the latter case, the polynomial Q 2 P n ,e{Q 2 ) is, 
of course, constructed by using the Clenshaw recursion relation, as explained in 
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section 3.2. Following this prescription, one first constructs the Chebyshev poly- 
nomial T* +1 , eq.@, with s replaced in the obvious way by the matrix Q 2 . Since 
the polynomial we are finally interested in is given by Q 2 P n ,e(Q 2 ) = ^ + pT* +1 , see 
eq.(||), any rounding error that is induced in the construction of T* +l is suppressed 
by a factor of 0(5) since \p\ < 5. 

On a given gauge field configuration and for a given R G we compute 

A = —7=|| border ~ ^Clenshaw || , (27) 

V JM 

where N is the number of degrees of freedom of the vector $ and |.| denotes the 
square root vector norm. 

Since the Clenshaw recurrence is believed to be the numerically most stable 
method to evaluate linear combinations of Chebyshev polynomials, the values 
of A can be interpreted as a measure for the effects of rounding errors in the eval- 
uation of $order- The results for A as a function of n are shown in Fig. ^. Using the 
naive ordering scheme, we could not run the cases of n = 64 and n = 100 because 
of numerical overflows. When adopting the pairing scheme, A takes large values 
for n = 64 and n = 100. The bit-reversal scheme gives small but non-negligible 
values of A for the considered values of n. Finally, Montvay's scheme yields A 
of order 10 -6 for all values of n. We conclude that, somewhat surprisingly, it is 
possible to find ordering schemes through which the construction of the polyno- 
mial Q 2 P n ,e{Q 2 ) can be done with a precision that is comparable with the one 
expected when performing 0(n) multiplications in 32-bit arithmetics. As already 
observed, one may evaluate the corresponding quantity A also for the product 
representation eq. ([IB]) |16[| . In this case again, small values of A « O(10~ 6 ) are 
found when using, however, either Montvay's or bit-reversal ordering schemes. 
This indicates that the improvement achieved through these ordering schemes 
may also depend to some extent on the chosen product representation. 



5 32-bit versus 64-bit precision 

In order to obtain A in eq.(^), the vector <3?cienshaw has been computed using 
solely 32-bit precision. It might be asked therefore, whether A can really be con- 
sidered as a measure of the size of the rounding errors occurring in the construc- 
tion of $ordcr- To the best of our knowledge, the theorem stated in [[H]], providing 
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an upper bound on the rounding errors that occur in the use of the Clenshaw 
recurrence, is not straightforwardly generalizable to the case of a polynomial in 
a matrix. Hence it can not be guaranteed that, in the case of a polynomial in 
a matrix, the Clenshaw recursion is providing us with a reference method to 
quantify the rounding errors on $ rder- In order to clarify this point, we decided 
to compare results obtained from 32-bit precision with those obtained from 64- 
bit precision programs. For this test we considered the SU(2) gauge group with 
the same bare parameters as specified in section 3.3, using three different lattice 
sizes. All the results reported in this section were obtained on a single thermalized 
gauge configuration and hence are given without a statistical error. However, we 
checked explicitly in some cases that using different gauge configurations yields 
only negligible changes in the results discussed here. 
In the following, we denote by 

Xordcr = Q\/c^(Q - r 2n ) ■ ■■■■ y/c^(Q - n)QR G (28) 

the vector obtained with 32-bit precision and by Xorder the corresponding vector 
obtained with 64-bit precision. The subscript "order" labels again different or- 
dering schemes and R G is a random vector obtained from a Gaussian distribution 
with unit variance. Analogously, we denote by xcienshaw and Xcienshaw the vectors 
obtained using the Clenshaw recursion with the two precisions. 
Then the norm of the difference between the vectors Xordcr and Xordcr, 

1 II ~ 

border 7=?||Xordcr Xordcr|| j \ ™) 

V A 

with N the number of degrees of freedom of the vector x, can serve as an esti- 
mate of the rounding errors when Xordor is evaluated. Let us start by comparing 
the values of ?7 or der for the subpolynomial (sp), bit reversal (br) and Montvay's 
ordering schemes, taking n = 64 and e = 0.0015: 

77 sp ~ 3.7 • 10~ 5 , 77 br ~ 4.3 • 10- 6 , r^ ontvay ~ 5.5 • HT 6 . (30) 

The above values, in particular for the bit reversal and Montvay's ordering 
scheme, are close to the precision level that is expected to be reached optimally 
on 32-bit machines after performing In + 2 = 130 multiplications. 
Of particular interest are the values of Xcienshaw Since we expect that the vector 
X is most precisely evaluated when the Clenshaw recurrence relation is used with 
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64-bit precision, the values of r/cienshaw would tell us how much the computation 
of Xcienshaw is affected by rounding errors. The results are shown in table 0. We 
compare also Xordor with Xcienshaw, using the bit-reversal scheme as an example 
for an ordering scheme and evaluate 

1 



Vbr 



I Xbr XClenshaw | 



(31) 



again reporting the results in table [I]. 



Table 1: The quantities r/cienshaw, eq. Q29"D , and r%, r , eq. (PT|) , for various parameters 
of the polynomial P n e . The values in the table are computed using one thermal- 
ized SU(2) gauge configuration at (3 — 1.75, k = 0.165 and c sw = 0. Different 
lattices sizes L 3 ■ T are compared. 



L 3 


•T 


n 
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^Clenshaw 


Vbr 




8 3 - 


16 


16 


0.0215 


0.013 


8.4 


10" 8 


1.6 


10~ 6 


8 3 - 


16 


32 


0.0058 


0.013 


1.3 


10- 7 


2.1 


10~ 6 


8 3 - 


16 


64 


0.0015 


0.013 


2.7 


10- 7 


4.3 


10- 6 


8 3 - 


16 


100 


0.0006 


0.014 


6.0 


10- 7 


9.3 


lO" 6 


4 3. 


8 


64 


0.0015 


0.013 


2.7 


10- 7 


4.6 


10- 6 


6 3. 


16 


64 


0.0015 


0.013 


2.7 


10- 7 


4.5 


10- 6 


8 3 - 


16 


64 


0.0005 


0.109 


2.1 


10" 6 


5.6 


10~ 6 


8 3 - 


16 


64 


0.0001 


0.545 


9.8 


10" 6 


8.7 


10" 6 



For practical values of 5 < 0.014 the Clenshaw recursion relation turns out to 
be at least one order of magnitude more precise than the bit-reversal scheme. In 
fact, the magnitude of the rounding error using the Clenshaw recursion looks in 
some cases even better than what is naively expected from using 32-bit precision. 
This effect is easily explained by the fact that the rounding error is suppressed 
by a factor of 0(5), as discussed in the previous section. In addition, one can 
observe that, as 5 increases, the magnitude of the rounding error, as measured 
by r/cienshaw, also grows and eventually reaches values of the same order as for the 
bit-reversal scheme when S becomes of 0(1). We conclude that for 5 < 0.1 the 
Clenshaw recursion relation allows for a very precise evaluation of Q 2 P n ,e{Q 2 ), 
even when only 32-bit arithmetic is employed. This allows to test the different 
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ordering schemes and to obtain reliable estimates of the rounding errors associated 
to these schemes by using solely 32-bit precision. 

Table [l] also shows that the magnitude of rounding errors, as estimated by r)b r for 
the bit-reversal scheme, increases moderately with the degree n of the polynomial 
(with e tuned to keep S constant), but do not significantly depend on the consid- 
ered lattice sizes. The problem of the lattice size dependence of rounding errors 
has not been systematically investigated. However, on lattices not larger than 
8 3 ■ 16, we could not observe any significant dependence of the rounding errors, 
measured through r\ and f), on the lattice size. This might be explained by the 
fact that Q has a band structure with only the diagonal and some off-diagonal 
matrix elements not vanishing. For this argument to be valid, however, the pre- 
cision losses, characterized by rj, occurring in a single multiplication by Q, should 
be small enough for the propagation of their effect through subsequent multipli- 
cations to be negligible. In practice we have found that when rj reaches a level 
of rj ~ O(10 -4 ), a significant growth of the size of rounding errors, estimated by 
rj itself, can be observed as the lattice volume increases. When such a behaviour 
sets in, it is certainly advisable to switch from 32-bit to 64-bit arithmetics. 

6 Conclusions 

In a class of fermion simulation algorithms, relying on the multiboson technique, 
polynomials in a matrix have to be numerically evaluated. In a number of cases 
the polynomials are needed in their product representation, in order to achieve 
an efficient performance of the algorithms. Moreover, the simulations are often 
done on machines having only 32-bit precision. However, it is well known that, 
in evaluating a polynomial using its product representation, great care has to 
be taken, since rounding errors can easily lead to significant precision losses and 
even numerical instabilities. 

In this paper we investigated the effects of various ordering schemes for the mono- 
mial factors or, equivalently, the complex roots in the numerical construction of a 
polynomial according to a product representation. We found that different order- 
ing schemes can lead to rounding-error effects ranging from numerical overflow 
to retaining a precision comparable to the one that can be provided by the use 
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of numerically stable recurrence relations. 

In the case of a polynomial of a single, real variable s, approximating the function 
1/s, we introduced estimators for the rounding errors, which give qualitative 
information about the level of precision losses occurring when a polynomial in a 
matrix is considered. These estimators indicate that the bit-reversal scheme and 
a scheme suggested by Montvay can keep rounding-error effects to a low level, for 
polynomials in a matrix of order up to n ~ 200. 

Considering cases relevant for numerical simulations of lattice QCD, we studied 
the magnitude of the precision losses affecting the evaluation of polynomials in 
a particular matrix, when adopting different ordering schemes for the monomials 
of the product representation. We found that Montvay's ordering scheme seems 
to be particularly suited for this problem: by adopting this scheme, the rounding 
errors could be kept at a level that is comparable to the one that is reached when 
using the stable Clenshaw recurrence relation. 

As the most important outcome of our investigation, we conclude that there exist 
orderings of the roots that allow a numerically precise evaluation of a polynomial 
in a matrix, even up to degrees n of the polynomial of 0(100). Although in this 
work only a particular matrix, relevant for simulations of lattice QCD, has been 
considered, we think that the main features of our results should hold true for 
several different kinds of matrices and that, also, product representations can be 
used in more general situations while keeping rounding-error effects at a tolerable 
level. 

Acknowledgements 

This work is part of the ALPHA collaboration research programme. We are 
most grateful to S. Sint and U. Wolff for a critical reading of the manuscript. 
We thank DESY for allocating computer time to this project. R.F. thanks the 
Alexander von Humboldt Foundation for the financial support to his research 
stay at DESY-Hamburg, where part of this work was done. 



21 



A Definition of the matrix Q 



An application, where a product representation of polynomials in a matrix is used, 
is the Monte Carlo simulation of fermion systems by the multiboson technique 
and related algorithms. For completeness, we will give in this appendix an explicit 
definition of the matrix Q used for the tests presented in this paper. 
Let us consider a Euclidean space-time lattice with lattice spacing a and size L 3 -T. 
With the lattice spacing set to unity from now on, the points on the lattice have 
integer coordinates (t, Xi, X2, £3). A gauge field U^(x) G SU(N C ) is assigned to 
the link pointing from the site x to the site {x + //), where fi = 0, 1, 2, 3 denotes 
the four forward directions in space-time. In this paper we will use N c = 2, 3. 
For the case of N c = 2, the SU(2) gauge group, the boundary conditions are 
chosen to be periodic in all directions. For N c = 3, the SU(3) gauge group, we 
adopt periodic boundary conditions in space and Schrodinger functional boundary 



conditions in the time direction ||22|| . After integrating out the fermion fields, 
their effects appear in the resulting effective theory through the determinant of 
a matrix A = A[U], depending on the gauge field configuration U. In the case of 
the SU(N C ) gauge theory with rif = 2 mass degenerate quark species, which is 
considered here, we have A = Q 2 , where the matrix Q, characterizing the fermion 
lattice action, is taken as follows: 

Q[U] xy = — 75 [(1 + Y][l:C BV Ka IJ , v J : IM/ (x)])8 Xj y 

^{^-l,)UM^y + ^+l,H^ x -^-^} ■ ( 32 ) 



1-1 



Here k and c sw are parameters that have to be chosen according to the physical 
problem under consideration, Cq = (1 + 8k) _1 , and Cm is a constant serving to 
optimize simulation algorithms. Typically k ~ 1/8 and both c sw and cm are 0(1). 
In order to speed up the Monte Carlo simulation, it is not the original matrix Q 
but an even-odd preconditioned matrix Q that is used. Let us rewrite the matrix 
Q in eq. (|32]) as 

Q^J 1 + Tee Me ° ) , (33) 
cm \ M oe 1+T 00 ) 

where we introduce the matrix T ee (T 00 ) acting on vectors defined on the even 
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(odd) sites: 

(T) xaa , ybp = Y,l^s W ^^(x)5 xy ] . (34) 

[IV 

The off-diagonal parts M eo and M oe connect the even with odd and odd with 
even lattice sites, respectively. Preconditioning is now realized by writing the 
determinant of Q, apart from an irrelevant constant factor, as 

det(Q) oc det(l + T ee )detQ 

Q = -^- 75 (l + T 00 -M oe (l + T ee )- 1 M eo ) . (35) 
cm 

The constant factor Co is given by Co = 1/(1 + 64/t 2 ), and the constant Cm is 
chosen such that the eigenvalues of Q are in the interval [—1,1]. 
The matrices a^ u , fj,,u — 0, ...,3 in the above expressions, are defined by the 
commutator of 7-matrices 

= - hn,lA ■ (36) 



The 4 ® 4 7-matrices are given by 



e, 




(37) 

V e v u / 

with the 2 <g> 2 matrices 

eo = —1 e_j = j = 1, 2, 3 (38) 

and the Pauli-matrices Gj 





The matrix 75 = 7 o7i72 7 3 is thus diagonal 

75 = ( * : 1 < 10) 




* s a tensor antisymmetric under the exchange // <-> 1/ and, for given values 
of /x, z/, an ant i- Hermit ian 7V C <g> iV c matrix depending on the gauge links in the 
vicinity of the lattice site x: 
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= ±[u»(x)U v (x + fi)Ul(x + v)Ut(x) 

+ U v (x)Ul(x + v - fi)Ul(x - /t)C/ M (x - p) 

+ UUx — p)Ul(x — v — fc)U^(x — v — jl)U v {x — v) 

+ Ut(x - 0)U^(x - u)U u (x - + fi)Ul(x) 

- h.c] . 



(41) 
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Figure 1: The behaviour of wi(q) in eq.([T5|) as a function of I (a). The 
three curves correspond to q 2 = e (full line), q 2 = (1 + e)/2 (dotted 
line) and q 2 = 1 (dashed line). In (b) we give Qi(Q) of eq.([T5|) as a 
function of I, by computing fli(Q) once with 32-bit (full line) and once 
with 64-bit (dashed line) arithmetics. 
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Figure 2: The roots Zk, eq. @, with fc = 1, . . . , 16 and e = 0.1 are 
shown in the complex plane. Labels of roots indicate in which order 
they are used for the numerical evaluation of the polynomial, eq. (fLOf), 
within each ordering scheme. We show in a) the naive, b) the pairing, 
c) the bit reversal and d) the Montvay's ordering scheme. 
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Figure 3: The behaviour of (a) u>i(q), eq. ([l5|) , and (b) of Qi(Q), 
eq. ([l5|) , as a function of /. We use the subpolynomial (solid line), the 
bit reversal (dotted line) and Montvay's (dash-dotted line) ordering 
schemes. For Fig. 3a, we have chosen only q = 1. All results shown in 
the figure are obtained using 32-bit precision. 
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Figure 4: -R max , eq. fl2"4|), is shown as a function of s m i n /e, where s m i n is the lower 
end of the interval [s m in, 1] from which the values of s are taken to compute -R max . 
We show data for three polynomials of different degree, n = 30, n = 86 and 
n = 146 at fixed relative fit accuracy 5 = 0.001, using the bit-reversal scheme. 
Although different in magnitude, -R max shows also for the other schemes a very 
similar flat behaviour function of s min /e. 
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Figure 5: -R ma x, eq. fl2"4|), and M max , eq. fl25|), are shown as a function of the 
degree of the polynomial at fixed relative fit accuracy 5 = 0.1. We compare 
subpolynomial and bit-reversal ordering schemes. 
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Figure 6: -R ma x, eq. (p4[), and M max , eq. (p5|), are shown as a function of the 
degree of the polynomial at fixed relative fit accuracy 5 = 0.001. We compare 
naive, pairing, bit reversal and Montvay's ordering schemes. 
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Figure 7: The quantity A, eq. (p7f), is shown as a function of the 
degree n of the polynomial occurring in its definition. We compare the 
naive, pairing, bit reversal (br) and Montvay's (M) ordering schemes. 
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